Implementation of VTLN for statistical speech synthesis

Authors

  • Lakshmi Saheer
  • John Dines
  • Philip N. Garner
  • Hui Liang
Abstract

Vocal tract length normalization (VTLN) is an important feature normalization technique that can be used for speaker adaptation when very little adaptation data is available. Earlier work showed that VTLN can be applied to statistical speech synthesis and that its improvements are additive to those of constrained maximum likelihood linear regression (CMLLR). This paper presents an expectation-maximization (EM) optimization for estimating more accurate warping factors. The EM formulation embeds the feature normalization in the HMM training, which makes warping-factor estimation more efficient and enables the use of multiple (appropriate) warping factors for different state clusters of the same speaker.
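To make the notion of a warping factor concrete, here is a minimal sketch of a frequency warping function controlled by a single parameter. It assumes the bilinear (all-pass) warping commonly used in VTLN; the abstract does not say which warping function the paper uses, and the names `bilinear_warp` and `alpha` are illustrative only. It shows the warping alone, not the EM estimation the paper describes.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency with a bilinear (all-pass) function.

    omega: frequency in radians, in the range [0, pi].
    alpha: warping factor, assumed to satisfy |alpha| < 1 (hypothetical name).
    """
    # alpha = 0 leaves the axis unchanged; positive alpha maps interior
    # frequencies upward, negative alpha maps them downward.
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


if __name__ == "__main__":
    # Print the warped position of a few frequencies for three warping factors.
    for alpha in (-0.1, 0.0, 0.1):
        warped = [bilinear_warp(k * math.pi / 4, alpha) for k in range(5)]
        print(alpha, [round(w, 3) for w in warped])
```

Under this assumption, a speaker (or a state cluster of a speaker) is described by one scalar `alpha`, which is why the technique remains usable with very little adaptation data.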


Similar resources

Framework Of Feature Based Adaptation For Statistical Speech Synthesis And Recognition

The advent of statistical parametric speech synthesis has paved new ways to a unified framework for hidden Markov model (HMM) based text to speech synthesis (TTS) and automatic speech recognition (ASR). The techniques and advancements made in the field of ASR can now be adopted in the domain of synthesis. Speaker adaptation is a well-advanced topic in the area of ASR, where the adaptation data ...


Implementation of VTLN for Statistical Speech Synthesis

Vocal tract length normalization is an important feature normalization technique that can be used to perform speaker adaptation when very little adaptation data is available. It was shown earlier that VTLN can be applied to statistical speech synthesis and was shown to give additive improvements to CMLLR. This paper presents an EM optimization for estimating more accurate warping factors. The E...


VTLN-based Rapid Cross-lingual Adaptation for Statistical Parametric Speech Synthesis

Cross-lingual speaker adaptation (CLSA) has emerged as a new challenge in statistical parametric speech synthesis, with specific application to speech-to-speech translation. Recent research has shown that reasonable speaker similarity can be achieved in CLSA using maximum likelihood linear transformation of model parameters, but this method also has weaknesses due to the inherent mismatch cause...


Combining Vocal Tract Length Normalization with Linear Transformations in a Bayesian Framework

Recent research has demonstrated the effectiveness of vocal tract length normalization (VTLN) as a rapid adaptation technique for statistical parametric speech synthesis. VTLN produces speech with naturalness preferable to that of MLLR-based adaptation techniques, being much closer in quality to that generated by the original average voice model. By contrast, with just a single parameter, VTLN c...


New method for rapid vocal tract length adaptation in HMM-based speech synthesis

We present a new method to rapidly adapt the models of a statistical synthesizer to the voice of a new speaker. We apply a relatively simple linear transform that consists of a vocal tract length normalization (VTLN) part and a long-term average cepstral correction part. Despite the logical limitations of this approach, we will show that it effectively reduces the gap between source and target ...




Publication date: 2010